CMIR-NET: A Deep Learning Based Model For Cross-Modal Retrieval In Remote Sensing
We address the problem of cross-modal information retrieval in the domain of
remote sensing. In particular, we are interested in two application scenarios:
i) cross-modal retrieval between panchromatic (PAN) and multi-spectral imagery,
and ii) multi-label image retrieval between very high resolution (VHR) images
and speech based label annotations. Notice that these multi-modal retrieval
scenarios are more challenging than the traditional uni-modal retrieval
approaches given the inherent differences in distributions between the
modalities. However, with the growing availability of multi-source remote
sensing data and the scarcity of enough semantic annotations, the task of
multi-modal retrieval has recently become extremely important. In this regard,
we propose a novel deep neural network based architecture that learns a
discriminative shared feature space for all the input modalities,
suitable for semantically coherent information retrieval. Extensive experiments
are carried out on the benchmark large-scale PAN-multi-spectral DSRSID
dataset and the multi-label UC-Merced dataset. Together with the Merced
dataset, we generate a corpus of speech signals corresponding to the labels.
Superior performance with respect to the current state-of-the-art is observed
in all cases.
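The core idea of a discriminative shared feature space can be illustrated with a minimal sketch: each modality gets its own encoder that projects into a common space, and retrieval reduces to nearest-neighbour search there. The linear "encoders" and dimensions below are hypothetical stand-ins, not the CMIR-NET architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "encoders" projecting each modality into a shared 8-d space.
W_pan = rng.normal(size=(16, 8))   # PAN features (16-d) -> shared space
W_ms  = rng.normal(size=(32, 8))   # multi-spectral features (32-d) -> shared space

def embed(x, W):
    z = x @ W
    # Unit-norm embeddings so that dot products are cosine similarities.
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

pan_gallery = embed(rng.normal(size=(100, 16)), W_pan)  # gallery of PAN embeddings
query       = embed(rng.normal(size=(1, 32)), W_ms)     # one multi-spectral query

scores = (query @ pan_gallery.T).ravel()  # cosine similarity to every gallery item
ranked = np.argsort(-scores)              # retrieval order, best match first
print(ranked[:5])
```

In a trained model the projections would of course be learned jointly so that semantically matching PAN and multi-spectral pairs land close together; the sketch only shows the retrieval mechanics once such a space exists.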
Zero-Shot Sketch Based Image Retrieval using Graph Transformer
The performance of a zero-shot sketch-based image retrieval (ZS-SBIR) task is
primarily affected by two challenges. The substantial domain gap between image
and sketch features needs to be bridged, while at the same time the side
information has to be chosen tactfully. Existing literature has shown that
varying the semantic side information greatly affects the performance of
ZS-SBIR. To this end, we propose a novel graph transformer based zero-shot
sketch-based image retrieval (GTZSR) framework for solving ZS-SBIR tasks which
uses a novel graph transformer to preserve the topology of the classes in the
semantic space and propagates the context-graph of the classes within the
embedding features of the visual space. To bridge the domain gap between the
visual features, we propose minimizing the Wasserstein distance between images
and sketches in a learned domain-shared space. We also propose a novel
compatibility loss that further aligns the two visual domains by bridging the
domain gap of one class with respect to the domain gap of all other classes in
the training set. Experimental results obtained on the extended Sketchy,
TU-Berlin, and QuickDraw datasets exhibit sharp improvements over the existing
state-of-the-art methods in both ZS-SBIR and generalized ZS-SBIR.
Comment: Accepted at ICPR 202
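The Wasserstein-distance objective mentioned above has a particularly simple empirical form in one dimension, which makes the idea easy to see: for equal-size samples, the Wasserstein-1 distance is the mean absolute difference of the sorted samples. The sketch below uses synthetic 1-D "features" as stand-ins; the actual framework operates on high-dimensional learned features.

```python
import numpy as np

def wasserstein_1d(u, v):
    """Empirical 1-D Wasserstein-1 distance between equal-size samples:
    sort both samples and average the absolute pairwise differences."""
    return np.mean(np.abs(np.sort(u) - np.sort(v)))

rng = np.random.default_rng(1)
img_feats    = rng.normal(loc=0.0, size=1000)  # stand-in image features
sketch_feats = rng.normal(loc=2.0, size=1000)  # stand-in sketch features, shifted by 2

d = wasserstein_1d(img_feats, sketch_feats)
print(d)  # close to the mean shift of 2.0
```

Minimizing such a distance between image and sketch feature distributions in a shared space is one way to shrink the domain gap; in practice this is done over minibatches during training rather than on fixed samples.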
CrossATNet - a novel cross-attention based framework for sketch-based image retrieval
We propose a novel framework for cross-modal zero-shot learning (ZSL) in the context of sketch-based image retrieval (SBIR). Conventionally, the SBIR schema mainly considers simultaneous mappings among the two image views and the semantic side information. Therefore, it is desirable to consider fine-grained classes, mainly in the sketch domain, using a highly discriminative and semantically rich feature space. However, the existing deep generative modeling based SBIR approaches majorly focus on bridging the gaps between the seen and unseen classes by generating pseudo-unseen-class samples. Besides violating the ZSL protocol, which forbids utilizing any unseen-class information during training, such techniques do not pay explicit attention to modeling the discriminative nature of the shared space. Also, we note that learning a unified feature space for the multi-view visual data is a tedious task considering the significant domain difference between sketches and color images. In this respect, as a remedy, we introduce a novel framework for zero-shot SBIR. While we define a cross-modal triplet loss to ensure the discriminative nature of the shared space, an innovative cross-modal attention learning strategy is also proposed to guide feature extraction from the image domain exploiting information from the respective sketch counterpart. In order to preserve the semantic consistency of the shared space, we consider a graph CNN based module which propagates the semantic class topology to the shared space. To ensure an improved response time during inference, we further explore the possibility of representing the shared space in terms of hash codes. Experimental results obtained on the benchmark TU-Berlin and Sketchy datasets confirm the superiority of CrossATNet in yielding state-of-the-art results.
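A cross-modal triplet loss of the kind mentioned above can be sketched in a few lines: a sketch anchor should lie closer to a matching image than to a non-matching one by at least a margin. This is a generic triplet-loss illustration with made-up 2-d vectors, not the exact loss or feature space used by CrossATNet.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Cross-modal triplet loss: pull the matching image toward its sketch
    anchor and push a non-matching image at least `margin` further away."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

sketch    = np.array([1.0, 0.0])   # sketch-domain anchor
img_same  = np.array([0.9, 0.1])   # image embedding of the same class
img_other = np.array([-1.0, 0.5])  # image embedding of a different class

print(triplet_loss(sketch, img_same, img_other))  # 0.0 -> margin constraint satisfied
```

When the constraint is violated (the wrong image is too close), the loss becomes positive and its gradient reshapes the shared space accordingly.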
A Simplified Framework for Zero-shot Cross-Modal Sketch Data Retrieval
We deal with the problem of zero-shot cross-modal image retrieval involving color and sketch images through a novel deep representation learning technique. The problem of sketch-to-image retrieval and vice versa is of practical importance, and a trained model in this respect is expected to generalize beyond the training classes, i.e., the zero-shot learning scenario. Nonetheless, considering the drastic distribution gap between the two modalities, a feature alignment is necessary to learn a shared feature space where retrieval can efficiently be carried out. Additionally, it should also be guaranteed that the shared space is semantically meaningful to aid in the zero-shot retrieval task. The very few existing techniques for zero-shot sketch-RGB image retrieval extend deep generative models for learning the embedding space; however, training a typical GAN-like model for multi-modal image data may be non-trivial at times. To this end, we propose a multi-stream encoder-decoder model that simultaneously ensures improved mapping between the RGB and sketch image spaces and high discrimination in the shared semantics-driven encoded feature space. Further, it is guaranteed that the class topology of the original semantic space is preserved in the encoded feature space, which subsequently reduces the model bias towards the training classes. Experimental results obtained on the benchmark Sketchy and TU-Berlin datasets establish the efficacy of our model as we outperform the existing state-of-the-art techniques by a considerable margin.
Synergistic Use of TanDEM-X and Landsat-8 Data for Crop-Type Classification and Monitoring
Classification of crop types using Earth Observation (EO) data is a challenging task. The challenge increases manyfold when we have diverse crops within a resolution cell. In this regard, optical and Synthetic Aperture Radar (SAR) data provide complementary information to characterize a target. Therefore, we propose to leverage the synergy between multispectral and SAR data for crop classification. We aim to use the newly developed model-free three-component scattering power components to quantify changes in scattering mechanisms at different phenological stages. By incorporating interferometric coherence information, we consider the morphological characteristics of the crops that are not available with polarimetric information alone. We also utilize the reflectance values from Landsat-8 spectral bands as complementary biochemical information on the crops. The classification accuracy is enhanced by combining these two sources of information using a neural network-based architecture with an attention mechanism. We utilize the time-series dual co-polarimetric (i.e., HH-VV) TanDEM-X SAR data and the multispectral Landsat-8 data acquired over an agricultural area in Seville, Spain. The use of the proposed attention mechanism for fusing SAR and optical data shows a significant improvement in classification accuracy, by 6.0% to 9.0%, as compared to the sole use of either the optical or SAR data. We also demonstrate that the utilization of single-pass interferometric coherence maps in the fusion framework enhances the overall classification accuracy by ≈ 3.0%.
Therefore, the proposed synergistic approach will facilitate accurate and robust crop mapping with high-resolution EO data at larger scales.
This work was supported in part by the German Aerospace Center (DLR), which provided all the TanDEM-X data under project POLI6736, in part by the State Research Agency (AEI), in part by the Spanish Ministry of Science and Innovation, and in part by the EU EFDR funds under Project TEC2017-85244-C2-1-P. The work of N. Bhogapurapu and S. Dey was supported by the Ministry of Education (formerly Ministry of Human Resource and Development, MHRD), Government of India.
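The attention-based fusion of SAR and optical features described above can be sketched as scoring each modality's feature vector, normalizing the scores with a softmax into weights, and taking the weighted sum. The scoring vector and feature dimensions below are hypothetical; in the actual architecture these quantities are learned end-to-end.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # shift for numerical stability
    return e / e.sum()

def attention_fuse(sar, optical, w_score):
    """Toy attention fusion: score each modality, softmax the scores into
    weights, and return the weighted combination of the two feature vectors."""
    feats = np.stack([sar, optical])  # (2, d): one row per modality
    scores = feats @ w_score          # one scalar score per modality
    weights = softmax(scores)         # attention weights, sum to 1
    return weights @ feats, weights

rng = np.random.default_rng(2)
sar_feat     = rng.normal(size=8)  # stand-in dual co-pol TanDEM-X feature vector
optical_feat = rng.normal(size=8)  # stand-in Landsat-8 spectral feature vector
w = rng.normal(size=8)             # hypothetical learned scoring vector

fused, weights = attention_fuse(sar_feat, optical_feat, w)
print(weights)
```

Because the weights are data-dependent, the network can lean on the SAR stream when structure dominates and on the optical stream when spectral information is more discriminative, which is the intuition behind the reported accuracy gain over either modality alone.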